These were assessed in `IWMD gene panels.Rmd`.
ML_predictors <- read_csv("/home/MinaRyten/Kylie/Functional_genomic_annotation_IWMD/raw_data/ML_predictors_input.csv")
ML_predictors %>% dplyr::select(Predictor) %>% as_tibble()

# Load genic features
# Without missing values
load_predictors <- read_csv("/home/MinaRyten/Kylie/Functional_genomic_annotation_IWMD/raw_data/updated_predictor_matrix_w_ataxia_genes.csv") %>%
  # Remove columns by position; these indices depend on the CSV's current column order
  select(-c(161:180, 228, 230, 232, 233, 275, 315, 317))
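Dropping columns by position is fragile: if the upstream CSV gains, loses, or reorders a column, the same indices silently remove different predictors. Below is a minimal sketch of a name-based alternative. The toy tibble and its column names are hypothetical stand-ins, not the real predictor matrix.

```r
library(dplyr)
library(tibble)

# Hypothetical toy predictor matrix standing in for the real CSV
toy_predictors <- tibble(
  ensembl_gene_id = c("ENSG01", "ENSG02"),
  gc_content      = c(0.41, 0.55),
  dup_col.x       = c(1, 2),
  dup_col.y       = c(1, 2)
)

# Record the names once (here from positions 3 and 4) ...
cols_to_drop <- colnames(toy_predictors)[c(3, 4)]

# ... then drop by name; any_of() ignores names that are absent,
# so a reordered or shortened file cannot remove the wrong column
toy_clean <- toy_predictors %>% select(-any_of(cols_to_drop))

colnames(toy_clean)
```

On the real data, the recorded names could be saved alongside the analysis so that later runs drop the same predictors regardless of column order.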
# Give the retained join-suffixed (.x/.y) columns clean names
load_predictors <- load_predictors %>%
  rename_with(~ "exon_width", .cols = matches("^exon_width\\.y$")) %>%
  rename_with(~ "intron_width", .cols = matches("^intron_width\\.y$")) %>%
  rename_with(~ "ensembl_gene_id", .cols = matches("^gene_id\\.x$")) %>%
  rename_with(~ "width", .cols = matches("^width\\.x$"))
which(str_detect(colnames(load_predictors), fixed("width.x")))
## [1] 209
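A hit for `"width.x"` after renaming `width.x` is consistent with how `fixed()` works: it is a literal substring match, so it also matches columns such as `exon_width.x`. Below is a minimal sketch on hypothetical toy tables. It shows how a join produces `.x`/`.y` suffixes, how the anchored `rename_with()` calls touch exactly one column each, and how the two kinds of match differ. Which real column sits at position 209 is not established here.

```r
library(dplyr)
library(stringr)
library(tibble)

# Two hypothetical annotation tables that both carry width columns
genes_a <- tibble(gene_id = c("ENSG01", "ENSG02"),
                  width = c(100, 250), exon_width = c(40, 90))
genes_b <- tibble(gene_id = c("ENSG01", "ENSG02"),
                  width = c(120, 260), exon_width = c(45, 95))

# Joining on gene_id adds .x / .y suffixes to every other shared column:
# gene_id, width.x, exon_width.x, width.y, exon_width.y
joined <- left_join(genes_a, genes_b, by = "gene_id")

# Anchored regexes rename exactly one column each
renamed <- joined %>%
  rename_with(~ "width", .cols = matches("^width\\.x$")) %>%
  rename_with(~ "exon_width", .cols = matches("^exon_width\\.y$"))

# fixed() is a literal substring match, so it still hits exon_width.x
which(str_detect(colnames(renamed), fixed("width.x")))   # 3

# An anchored regex hits only a column named exactly "width.x"
which(str_detect(colnames(renamed), "^width\\.x$"))       # integer(0)
```

Note that `rename_with()` with a constant new name errors if the pattern matches no column, so each anchored pattern assumes the suffixed column is present.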
Note that these predictors cover the 17,635 genes from PanelApp, generated with `G2PML::fromGenes2MLData(genes = genes, which.controls = "allgenome")`, where `genes` is derived with the function `getGenesFromPanelApp`. For the ML predictor, we would need to consider whether to extend beyond these 17,635 genes (Ensembl v72). The control gene set does not include the Neurology and Neurodevelopmental disorders panel.

## To save re-running, load the re-run version for mitochondrial genes below and therefore ignore “condition”
# Table of all G2PML predictors for IWMD panel genes
combined_gene_set_panelapp_modified <- readRDS("/home/MinaRyten/Kylie/Functional_genomic_annotation_IWMD/raw_data/combined_gene_set_panelapp_modified.rds")
load_predictors %>% as_tibble()
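The comment above describes a table of G2PML predictors restricted to IWMD panel genes. One way to build it is a `semi_join` on Ensembl gene IDs. Below is a minimal sketch, assuming `combined_gene_set_panelapp_modified` carries an Ensembl ID column. The toy tibbles and the `gene_id` column name are hypothetical.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-ins for load_predictors and the PanelApp gene set
predictors <- tibble(
  ensembl_gene_id = c("ENSG01", "ENSG02", "ENSG03"),
  width           = c(1000, 2500, 800)
)
panel_genes <- tibble(gene_id = c("ENSG02", "ENSG03", "ENSG99"))

# Keep only predictor rows whose gene is on the panel; semi_join adds
# no columns from panel_genes and never duplicates predictor rows
panel_predictors <- predictors %>%
  semi_join(panel_genes, by = c("ensembl_gene_id" = "gene_id"))

panel_predictors
```

A `semi_join` is used rather than `inner_join` so that duplicated IDs in the panel object cannot multiply predictor rows.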